Locally Non-linear Embeddings for Extreme Multi-label Learning

نویسندگان

  • Kush Bhatia
  • Himanshu Jain
  • Purushottam Kar
  • Prateek Jain
  • Manik Varma
چکیده

The objective in extreme multi-label learning is to train a classifier that can automatically tag a novel data point with the most relevant subset of labels from an extremely large label set. Embedding based approaches make training and prediction tractable by assuming that the training label matrix is low-rank and hence the effective number of labels can be reduced by projecting the high dimensional label vectors onto a low dimensional linear subspace. Still, leading embedding approaches have been unable to deliver high prediction accuracies or scale to large problems as the low rank assumption is violated in most real world applications. This paper develops the X1 classifier to address both limitations. The main technical contribution in X1 is a formulation for learning a small ensemble of local distance preserving embeddings which can accurately predict infrequently occurring (tail) labels. This allows X1 to break free of the traditional low-rank assumption and boost classification accuracy by learning embeddings which preserve pairwise distances between only the nearest label vectors. We conducted extensive experiments on several real-world as well as benchmark data sets and compared our method against state-of-the-art methods for extreme multi-label classification. Experiments reveal that X1 can make significantly more accurate predictions then the state-of-the-art methods including both embeddings (by as much as 35%) as well as trees (by as much as 6%). X1 can also scale efficiently to data sets with a million labels which are beyond the pale of leading embedding methods.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Sparse Local Embeddings for Extreme Multi-label Classification

The objective in extreme multi-label learning is to train a classifier that can automatically tag a novel data point with the most relevant subset of labels from an extremely large label set. Embedding based approaches attempt to make training and prediction tractable by assuming that the training label matrix is low-rank and reducing the effective number of labels by projecting the high dimens...

متن کامل

Leveraging Distributional Semantics for Multi-Label Learning

We present a novel and scalable label embedding framework for large-scale multi-label learning a.k.a ExMLDS (Extreme Multi-Label Learning using Distributional Semantics). Our approach draws inspiration from ideas rooted in distributional semantics, specifically the Skip Gram Negative Sampling (SGNS) approach, widely used to learn word embeddings for natural language processing tasks. Learning s...

متن کامل

Deep Extreme Multi-label Learning

Extreme multi-label learning (XML) or classification has been a practical and important problem since the boom of big data. The main challenge lies in the exponential label space which involves 2 possible label sets when the label dimension L is very large, e.g., in millions for Wikipedia labels. This paper is motivated to better explore the label space by building and modeling an explicit labe...

متن کامل

Large-Scale Bayesian Multi-Label Learning via Topic-Based Label Embeddings

We present a scalable Bayesian multi-label learning model based on learning lowdimensional label embeddings. Our model assumes that each label vector is generated as a weighted combination of a set of topics (each topic being a distribution over labels), where the combination weights (i.e., the embeddings) for each label vector are conditioned on the observed feature vector. This construction, ...

متن کامل

Non-Linear Smoothed Transductive Network Embedding with Text Information

Network embedding is a classical task which aims to map the nodes of a network to lowdimensional vectors. Most of the previous network embedding methods are trained in an unsupervised scheme. Then the learned node embeddings can be used as inputs of many machine learning tasks such as node classification, attribute inference. However, the discriminant power of the node embeddings maybe improved...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1507.02743  شماره 

صفحات  -

تاریخ انتشار 2015